Building Space-Efficient Inverted Indexes on Low-Cardinality Dimensions

Authors

  • Vasilis Spyropoulos
  • Yannis Kotidis
Abstract

Many modern applications naturally lead to the implementation of inverted indexes for effectively managing large collections of data items. Creating an inverted index on a low-cardinality data domain results in replication of data descriptors, leading to increased storage overhead. For example, the use of RFID or similar sensing devices in supply chains results in massive tracking datasets that need effective spatial or spatio-temporal indexes. As the volume of data grows proportionally larger than the number of spatial locations or time epochs, it is unavoidable that many of the resulting lists share large subsets of common items. In this paper we present techniques that exploit this characteristic of modern big-data applications in order to losslessly compress the resulting inverted indexes by discovering large common item sets and adapting the index so as to store just one copy of them. We apply our method in the supply-chain domain using modern big-data tools and show that our techniques in many cases achieve compression ratios that exceed 50%.
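The abstract describes the approach only at a high level, so the following Python sketch illustrates the general idea under stated assumptions; it is not the authors' algorithm. It builds an inverted index from made-up (tag, location) readings, then greedily factors out the largest item set shared by any two lists, stores that set once, and replaces it with a reference in every list that fully contains it. The greedy pairwise-intersection heuristic, the function names (`build_inverted_index`, `compress`, `decompress`, `stored_size`), the `min_shared` threshold, and the cost model (one unit per stored item id or reference) are all hypothetical choices made for illustration.

```python
from itertools import combinations


def build_inverted_index(records):
    """Map each dimension value (e.g. a location) to the sorted ids of items seen there."""
    index = {}
    for item_id, value in records:
        index.setdefault(value, set()).add(item_id)
    return {value: sorted(ids) for value, ids in index.items()}


def compress(index, min_shared=3):
    """Greedy sketch (an assumption, not the paper's method): repeatedly take the
    largest intersection between any two residual lists, store it once, and
    replace it with a reference in every list that still fully contains it."""
    assert min_shared >= 1, "an empty shared set would never shrink the lists"
    residual = {key: set(ids) for key, ids in index.items()}
    refs = {key: [] for key in index}
    shared = []  # shared[sid] = common item ids, stored exactly once
    while True:
        best = None
        for a, b in combinations(residual, 2):
            common = residual[a] & residual[b]
            if len(common) >= min_shared and (best is None or len(common) > len(best)):
                best = common
        if best is None:
            break
        sid = len(shared)
        shared.append(sorted(best))
        for key, items in residual.items():
            if best <= items:
                items -= best  # in-place difference on this list's residual set
                refs[key].append(sid)
    return {key: (sorted(residual[key]), refs[key]) for key in index}, shared


def decompress(lists, shared):
    """Rebuild every original list from its residual items plus referenced shared sets."""
    out = {}
    for key, (items, sids) in lists.items():
        merged = set(items)
        for sid in sids:
            merged.update(shared[sid])
        out[key] = sorted(merged)
    return out


def stored_size(lists, shared):
    """Hypothetical cost model: one unit per stored item id and per reference."""
    per_list = sum(len(items) + len(sids) for items, sids in lists.values())
    return per_list + sum(len(s) for s in shared)


# Toy RFID-style readings: (tag id, location); the data are made up for illustration.
records = ([(i, "dock") for i in (1, 2, 3, 4, 5)]
           + [(i, "warehouse") for i in (1, 2, 3, 4, 6)]
           + [(i, "store") for i in (1, 2, 3, 4)])

index = build_inverted_index(records)
lists, shared = compress(index)
assert decompress(lists, shared) == index  # lossless round trip
print(sum(len(ids) for ids in index.values()), stored_size(lists, shared))  # 14 9
```

On this toy data the three lists hold 14 item ids; after the shared set {1, 2, 3, 4} is stored once, the assumed cost model counts 9 units, and `decompress` reconstructs the original lists exactly. Each round shrinks at least two residual lists, so the loop terminates, but the pairwise scan is quadratic in the number of lists per round; a system at the scale the abstract targets would need a more scalable way to discover large common item sets.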

Similar resources

Efficient Skycube Computation Using Bitmaps Derived from Indexes

TAMBARAM KAILASAM, GAYATHRI. Efficient Skycube Computation using Bitmaps derived from Indexes. (Under the direction of Dr. Jaewoo Kang.) Skyline queries have been increasingly used in multi-criteria decision making and data mining applications. They retrieve a set of interesting points from a potentially large set of data points. A point is said to be interesting if it is as good or better in a...

Skylight Design Regulation for Residential Building in Hamadan City

Skylights or light wells are an integral part of the design of low- and high-depth buildings. The design of these skylights in different areas is based on specific criteria. According to Hamedan's criteria, only the dimensions of these skylights and the ratio of skylight area to height of the skylight are enough to design skylights. The purpose of this study was to evaluate the accuracy of sk...

Efficient Phrase Querying with an Auxiliary Index

Search engines need to evaluate queries extremely fast, a challenging task given the vast quantities of data being indexed. A significant proportion of the queries posed to search engines involve phrases. In this paper we consider how phrase queries can be efficiently supported with low disk overheads. Previous research has shown that phrase queries can be rapidly evaluated using nextword index...

A retrieval technique for high-dimensional data and partially specified queries

While the persistent data of many advanced database applications, such as OLAP and scientific studies, are characterized by very high dimensionality, typical queries posed on these data appeal to a small number of relevant dimensions. Unfortunately, the multi-dimensional access methods designed for high-dimensional data perform rather poorly for these partially specified queries. The retrieval ...

Publication date: 2015